ADR for updated data model #182

jacksonj04 · 2024-11-11T09:54:34Z

Our existing data model is starting to creak under the load, and needs a bit of a refresh. This ADR proposes a new structure for the data to better support future requirements.

doc/adr/0021-update-data-structure.md

This proposes a richer data structure to help us model how the various parts of "a decision" exist within the service, both logically and conceptually.

dragon-dxw · 2024-11-14T14:10:41Z

doc/adr/0021-update-data-structure.md

+
+Documents are the overarching item in the data structure, and are what most people will actually mean when they talk about a "judgment". Each document MUST be assigned a unique, non-semantic identifier by the Find Case Law service, and may have one or more other identifiers such as NCNs.
+
+Where a relationship exists between two documents (eg "X is a press summary of Y") this relationship would likely be stored bidirectionally, ie "is summarised by" and "is a summary of" to simplify retrieval.


I think it's worth being specific that a judgment and its press summary are different documents, and that we don't yet have a settled opinion on language.

What does the relationship between a revision and its document look like?
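To make the discussion concrete, here is a minimal sketch of the document model described in the excerpt above. It is not taken from the ADR or the Find Case Law codebase: the `Document` class, the `relate` helper, the use of UUIDs as the non-semantic identifier, and the example NCN are all assumptions for illustration. It treats a judgment and its press summary as two separate documents and stores the relationship on both sides.

```python
from dataclasses import dataclass, field
from uuid import uuid4

# Relationship names come from the ADR excerpt; the lookup table itself is hypothetical.
INVERSE = {
    "is a summary of": "is summarised by",
    "is summarised by": "is a summary of",
}


@dataclass
class Document:
    # Unique, non-semantic identifier assigned by the service (a UUID is an assumption).
    uuid: str = field(default_factory=lambda: str(uuid4()))
    # Other identifiers such as NCNs.
    identifiers: list[str] = field(default_factory=list)
    # Outgoing relationships as (relationship name, target document uuid) pairs.
    relationships: set[tuple[str, str]] = field(default_factory=set)


def relate(source: Document, relationship: str, target: Document) -> None:
    """Record the relationship on both documents so either side can be read directly."""
    source.relationships.add((relationship, target.uuid))
    target.relationships.add((INVERSE[relationship], source.uuid))


# The NCN below is a made-up placeholder, not a reference to a real judgment.
judgment = Document(identifiers=["[2024] UKSC 1"])
press_summary = Document()
relate(press_summary, "is a summary of", judgment)
```

Storing both directions duplicates data, but it means "what summarises this judgment?" can be answered from the judgment alone, which is the retrieval simplification the excerpt mentions.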

dragon-dxw · 2024-11-14T14:11:44Z

doc/adr/0021-update-data-structure.md

+
+A revision represents a distinct submission of a document to the National Archives, usually by a court or tribunal. For new submissions this will usually be via TDR, but some legacy ingestions may have been done via other means.
+
+A revision SHOULD have a "source document" which we consider to be the canonical representation of the revision, and from which all other representations are derived. This will usually be a .docx file for all new submissions, but could also be other types of file for legacy ingestions or future submissions. It is possible that legacy ingestions will no longer have the original file available for all past revisions (although this will remain in The National Archives' preservation system).


Is the source document, in practice, a link to S3 and maybe a hash of that file?
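One possible reading of that question, sketched under assumptions: the ADR excerpt does not say how the source document is stored, so the `SourceDocument` and `Revision` classes, the `describe_source` helper, the choice of SHA-256, and the S3 URI below are all hypothetical. The source is optional because the excerpt says a revision SHOULD (not MUST) have one.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceDocument:
    uri: str     # location of the file, e.g. an S3 object URI (assumed)
    sha256: str  # hex digest of the file contents (assumed hash algorithm)


@dataclass(frozen=True)
class Revision:
    document_uuid: str  # the document this revision is a submission of
    # SHOULD rather than MUST: legacy ingestions may have no original file available.
    source: Optional[SourceDocument] = None


def describe_source(uri: str, content: bytes) -> SourceDocument:
    """Build a source reference from a location and the file's bytes."""
    return SourceDocument(uri=uri, sha256=hashlib.sha256(content).hexdigest())


# The bucket, key, and document identifier here are placeholders.
source = describe_source("s3://example-bucket/example-key.docx", b"example file bytes")
new_revision = Revision(document_uuid="example-document-uuid", source=source)
legacy_revision = Revision(document_uuid="example-document-uuid")
```

Keeping a hash alongside the link would let the service detect whether the stored file still matches what was submitted, without the revision record holding the file itself.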

jacksonj04 force-pushed the adr/document-data-structure branch 2 times, most recently from d8e9977 to 992bbe1 Compare November 11, 2024 11:05

dragon-dxw reviewed Nov 11, 2024: four resolved comment threads on doc/adr/0021-update-data-structure.md, two of them marked outdated.

jacksonj04 force-pushed the adr/document-data-structure branch 2 times, most recently from 9007447 to 720037a Compare November 14, 2024 09:49

docs(FCL-455): add ADR 0021 on proposed updated data structure

7ede3ce


jacksonj04 force-pushed the adr/document-data-structure branch from 720037a to 7ede3ce Compare November 14, 2024 09:54

jacksonj04 marked this pull request as ready for review November 14, 2024 09:54

dragon-dxw reviewed Nov 14, 2024
